Distributed dynamic reinforcement of efficient outcomes in multiagent coordination and network formation
We analyze reinforcement learning under so-called “dynamic reinforcement”. In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement uses a combination of long-term and recent rewards to construct myopically forward-looking action-selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure-strategy Nash equilibria and specialize the results to coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement. In particular, we show how a single agent is able to destabilize an equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast these conclusions with prior game-theoretic results according to which the risk-dominant equilibrium is the only robust equilibrium when agents' decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows for analyzing its behavior at the boundary points of the state space.
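To make the mechanism concrete, the following is a minimal sketch of how such a learner might mix long-term and recent rewards into action-selection probabilities, applied to a two-player coordination game. The linear mixing weight `lam`, the forgetting factor `decay`, and the proportional selection rule are illustrative assumptions, not the paper's exact update.

```python
import numpy as np

rng = np.random.default_rng(0)

class DynamicReinforcementAgent:
    """Illustrative learner: mixes a cumulative (long-term) score with a
    recency-weighted score to form action-selection probabilities."""

    def __init__(self, n_actions, lam=0.5, decay=0.9):
        self.long_term = np.ones(n_actions)  # cumulative reward propensities
        self.recent = np.ones(n_actions)     # recency-weighted propensities
        self.lam = lam                       # mixing weight (a tunable "dynamic" parameter)
        self.decay = decay                   # forgetting factor for recent rewards

    def act(self):
        score = (1 - self.lam) * self.long_term + self.lam * self.recent
        p = score / score.sum()              # play proportionally to the mixed score
        return rng.choice(len(p), p=p)

    def update(self, action, reward):
        self.long_term[action] += reward     # accumulate long-term reward
        self.recent *= self.decay            # discount older recent scores
        self.recent[action] += reward        # reinforce the action just taken

# Two agents in a 2x2 coordination game: payoff 1 if actions match, else 0.
agents = [DynamicReinforcementAgent(2), DynamicReinforcementAgent(2)]
for _ in range(5000):
    a = [ag.act() for ag in agents]
    r = 1.0 if a[0] == a[1] else 0.0
    for ag, act in zip(agents, a):
        ag.update(act, r)
print([np.round(ag.long_term / ag.long_term.sum(), 2) for ag in agents])
```

Run long enough, both agents concentrate their long-term propensities on a common action, i.e., they coordinate on one of the two stable configurations.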
Design and Implementation of Distributed Resource Management for Time Sensitive Applications
In this paper, we address distributed convergence to fair allocations of CPU resources for time-sensitive applications. We propose a novel resource management framework in which a centralized objective for fair allocations is decomposed into a pair of performance-driven recursive processes for updating: (a) the allocation of computing bandwidth to the applications (resource adaptation), executed by the resource manager, and (b) the service level of each application (service-level adaptation), executed by each application independently. We provide conditions under which the distributed recursive scheme converges to solutions of the centralized objective (i.e., fair allocations). Contrary to prior work on centralized optimization schemes, the proposed framework exhibits adaptivity and robustness to changes in both the number and the nature of applications, while assuming minimal information available to both the applications and the resource manager. We finally validate our framework with simulations using the TrueTime toolbox in MATLAB/Simulink.
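As an illustration of the coupled recursions, the sketch below pairs a resource-adaptation step at the manager with an independent service-level step at each application, using a toy linear performance model. The demand weights `w`, the step sizes, and the specific update forms are assumptions made for illustration; the paper's actual recursions and convergence conditions differ.

```python
import numpy as np

# Toy model: application i at service level s_i demands w_i * s_i bandwidth and
# observes the surplus f_i = alloc_i - w_i * s_i as its performance signal.
w = np.array([1.0, 2.0, 0.5])   # per-unit demand of each application (assumed)
alloc = np.full(3, 1.0 / 3)     # manager's bandwidth allocation (sums to 1)
level = np.full(3, 0.2)         # each application's service level

eps_r, eps_s = 0.05, 0.05       # step sizes of the two recursions

for _ in range(2000):
    perf = alloc - w * level
    # (a) resource adaptation (manager): shift bandwidth toward apps performing
    #     below average, keeping the total allocation on the simplex.
    alloc += eps_r * (perf.mean() - perf)
    alloc = np.clip(alloc, 1e-6, None)
    alloc /= alloc.sum()
    # (b) service-level adaptation (each app independently): raise the level
    #     while there is surplus bandwidth, lower it when over-demanding.
    level = np.clip(level + eps_s * perf, 0.0, None)

print(np.round(alloc, 3), np.round(level, 3))
```

The two loops exchange no model information: the manager sees only performance signals, and each application sees only its own allocation, matching the minimal-information flavor of the framework.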
Deep Residual Policy Reinforcement Learning as a Corrective Term in Process Control for Alarm Reduction: A Preliminary Report
Conventional process controllers (such as proportional-integral-derivative controllers and model predictive controllers) are simple and effective once they have been calibrated for a given system. However, it is difficult and costly to re-tune these controllers if the system deviates from its normal conditions and starts to deteriorate. Recently, reinforcement learning has shown significant promise for learning process control policies through direct interaction with a system, without requiring a process model or knowledge of the system characteristics. However, developing such a black-box system is challenging when the system is complex, and it may not be possible to capture the complete dynamics of the system with a single reinforcement learning agent. Therefore, in this paper, we propose a simple architecture that does not replace the conventional proportional-integral-derivative controllers but instead augments the control input to the system with a reinforcement learning agent. The agent adds a correction factor to the output provided by the conventional controller to maintain optimal process control even when the system is not operating under its normal conditions.
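A minimal sketch of this architecture: the final control signal is the conventional controller's output plus an additive correction from the learned agent. The PID gains, the toy first-order plant, and the placeholder `rl_correction` are illustrative assumptions; in the paper the correction comes from a trained reinforcement learning policy.

```python
class PID:
    """Conventional proportional-integral-derivative controller."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def rl_correction(observation):
    """Stand-in for the learned agent: returns a small additive correction to
    the PID output. A trained policy network would supply this in practice."""
    return 0.0  # placeholder policy output

pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
setpoint, measurement = 1.0, 0.0
for _ in range(100):
    error = setpoint - measurement
    u = pid.control(error) + rl_correction((error, measurement))  # PID + residual term
    measurement += 0.1 * (u - measurement)  # toy first-order plant response
print(round(measurement, 3))
```

Because the correction is additive, setting it to zero recovers the original PID behavior, so the learned term can only ever modify, never replace, the calibrated controller.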
Hierarchical Framework for Interpretable and Probabilistic Model-Based Safe Reinforcement Learning
The difficulty of identifying physical models of complex systems has led to exploring methods that do not rely on such modeling. Deep reinforcement learning has pioneered solving this problem without relying on a physical model of the complex system, by interacting with it directly. However, it uses a black-box learning approach that makes it difficult to apply within real-world and safety-critical systems without providing explanations of the actions derived by the model. Furthermore, an open research question in deep reinforcement learning is how to focus policy learning on critical decisions within a sparse domain. This paper proposes a novel approach, BC-SRLA, for the use of deep reinforcement learning in safety-critical systems. It combines the advantages of probabilistic modeling and reinforcement learning with the added benefit of interpretability, and it works in collaboration and synchronization with conventional decision-making strategies. BC-SRLA is activated only in specific situations, identified autonomously through the fused information of the probabilistic model and the reinforcement learning agent, such as abnormal conditions or when the system is near failure. Further, it is initialized with a baseline policy using policy cloning, which allows minimal interaction with the environment and thereby addresses the challenges associated with using RL in safety-critical industries. The effectiveness of BC-SRLA is demonstrated through a maintenance case study on turbofan engines, where it shows superior performance to the prior art and other baselines.
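A rough sketch of the activation logic described above: a probabilistic belief about abnormality gates when the learned policy overrides the conventional strategy, and the learned policy is assumed to start from behavior-cloned baseline decisions. The function names, the threshold, and the toy state encoding are assumptions, not the paper's implementation.

```python
def baseline_policy(state):
    """Conventional decision-making strategy; here a trivial 'do nothing' rule."""
    return "continue"

def rl_policy(state):
    """Stand-in for the RL agent, assumed initialized by cloning the baseline
    policy so that it needs minimal environment interaction before deployment."""
    return "replace"

def abnormality_prob(state):
    """Stand-in for the fused probabilistic-model/RL estimate that the system
    is in an abnormal or near-failure condition."""
    return state  # pretend the state itself is a degradation score in [0, 1]

THRESHOLD = 0.5  # assumed activation threshold

def bc_srla_action(state):
    # Activate the RL agent only in autonomously identified abnormal situations;
    # otherwise defer to the conventional strategy.
    if abnormality_prob(state) > THRESHOLD:
        return rl_policy(state)
    return baseline_policy(state)

print(bc_srla_action(0.2), bc_srla_action(0.9))  # -> continue replace
```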
Specialized Deep Residual Policy Safe Reinforcement Learning-Based Controller for Complex and Continuous State-Action Spaces
Traditional controllers have limitations: they rely on prior knowledge about the physics of the problem, require modeling of the dynamics, and struggle to adapt to abnormal situations. Deep reinforcement learning has the potential to address these problems by learning optimal control policies through exploration in an environment. For safety-critical environments, however, it is impractical to explore randomly, and replacing conventional controllers with black-box models is undesirable. Exploration is also expensive in continuous state and action spaces unless the search space is constrained. To address these challenges, we propose a specialized deep residual policy safe reinforcement learning approach with a cycle of learning, adapted for complex and continuous state-action spaces. Residual policy learning yields a hybrid control architecture in which the reinforcement learning agent acts in synchronous collaboration with the conventional controller. The cycle of learning initializes the policy from the expert trajectory and guides exploration around it. Further, specialization through an input-output hidden Markov model restricts policy optimization to the region of interest (such as an abnormality), where the reinforcement learning agent is required and is activated. The proposed solution is validated on the Tennessee Eastman process control problem.
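To illustrate how the residual term and the specialization gate might fit together, the sketch below bounds a learned residual around an expert controller and enables it only when the gate flags the region of interest. The linear expert law, the residual bound, and the boolean gate standing in for the input-output hidden Markov model are all assumptions made for illustration.

```python
import numpy as np

def expert_action(state):
    """Conventional controller output around which exploration is guided."""
    return 0.5 * state  # illustrative proportional control law

def residual(state, theta):
    """Learned residual term; a linear stand-in for a policy network."""
    return float(theta @ np.array([state, 1.0]))

RES_LIMIT = 0.2  # bound keeping the hybrid action close to the expert trajectory

def control(state, theta, active):
    # `active` stands in for the input-output hidden Markov model's decision
    # that the state lies in the region of interest (e.g., an abnormal regime).
    r = np.clip(residual(state, theta), -RES_LIMIT, RES_LIMIT) if active else 0.0
    return expert_action(state) + r

theta = np.array([0.3, 0.1])  # pretend these are learned parameters
print(control(1.0, theta, active=False), control(1.0, theta, active=True))
```

Outside the region of interest the expert acts alone; inside it, the agent can shift the action only within the bounded neighborhood of the expert trajectory, which is what keeps exploration constrained.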
Interpretable Input-Output Hidden Markov Model-Based Deep Reinforcement Learning for the Predictive Maintenance of Turbofan Engines
An open research question in deep reinforcement learning is how to focus the policy learning of key decisions within a sparse domain. This paper focuses on combining the advantages of input-output hidden Markov models and reinforcement learning. We propose a novel hierarchical modeling methodology that, at a high level, detects and interprets the root cause of a failure as well as the health degradation of the turbofan engine, while at a low level, it provides the optimal replacement policy. This approach outperforms baseline deep reinforcement learning (DRL) models and has performance comparable to that of a state-of-the-art reinforcement learning system while being more interpretable.
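As a toy illustration of the two levels, the sketch below runs a forward filter over a two-state health model and applies a threshold replacement rule on the degradation belief. The transition and emission parameters, the Gaussian emission model, and the threshold are assumptions; the paper uses an input-output hidden Markov model and a trained DRL policy rather than this fixed rule.

```python
import numpy as np

# Two hidden health states: 0 = healthy, 1 = degraded. An illustrative reduction
# of the IOHMM to a plain HMM with fixed transitions (the paper also conditions
# on inputs).
TRANS = np.array([[0.98, 0.02],
                  [0.00, 1.00]])    # degradation is absorbing
EMIT_MEAN = np.array([0.0, 1.0])    # sensor reading mean per state

def filter_step(belief, obs):
    """One forward-filtering step: predict through the transition model, then
    weight by the observation likelihood under unit-variance Gaussian emissions."""
    predicted = belief @ TRANS
    lik = np.exp(-0.5 * (obs - EMIT_MEAN) ** 2)
    posterior = predicted * lik
    return posterior / posterior.sum()

def replacement_policy(belief, threshold=0.9):
    """Low-level decision: replace once the degradation belief is high enough.
    A trained DRL agent would supply this mapping; the threshold is assumed."""
    return "replace" if belief[1] > threshold else "continue"

belief = np.array([1.0, 0.0])
for obs in [0.1, -0.2, 0.4, 0.9, 1.1, 1.0]:  # illustrative sensor trace
    belief = filter_step(belief, obs)
    print(np.round(belief, 3), replacement_policy(belief))
```

The filtered belief is what makes the high level interpretable: the replacement decision can be traced back to an explicit, inspectable probability of degradation rather than to an opaque network activation.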